Filtering Bio-sequence Based on Sequence Descriptor

Authors

  • Te-Wen Hsieh
  • Huang-Cheng Kuo
  • Jen-Peng Huang
Abstract

Similarity searching in biological sequence databases has received substantial attention in the past decade, especially since the sequencing of the human genome. As databases keep growing, fast similarity search is becoming an important issue. A promising method for indexing bio-sequences is to transform sequences into numerical vectors, called sequence descriptors, and store them in a multidimensional data structure. In this paper, we present an effective sequence transformation method, called SD (Sequence Descriptor), which uses multiple features of a sequence, including Count, RPD (Relative Position Dispersion), and APD (Absolute Position Dispersion), to represent the original sequence data. In contrast to the q-gram transformation method, SD avoids the problem of vector size growing exponentially with q. We also present a transformation, called ST (Segment Transformation), which recursively divides sequence data into equal-length subsequences, transforms each subsequence, and concatenates the results. Experiments on human genome data show that our transformation method is more effective than the q-gram transformation method.
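The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names Count, RPD, and APD but does not define them, so the two dispersion formulas below (standard deviation of the gaps between consecutive occurrences, and standard deviation of the occurrence positions) are placeholder assumptions. The function names, the DNA alphabet, the halving strategy, and the `depth` parameter are likewise hypothetical.

```python
from itertools import product
from statistics import pstdev

ALPHABET = "ACGT"  # assumption: DNA sequences over four symbols


def qgram_vector(seq, q):
    """Count every length-q word over ALPHABET; the vector has 4**q entries."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=q))}
    vec = [0] * len(index)
    for i in range(len(seq) - q + 1):
        word = seq[i:i + q]
        if word in index:  # skip windows with symbols outside ALPHABET
            vec[index[word]] += 1
    return vec


def descriptor(seq):
    """Three features per symbol, so the length is fixed at 3 * len(ALPHABET).

    The two dispersion features are stand-ins for RPD and APD,
    not the definitions used in the paper.
    """
    vec = []
    for symbol in ALPHABET:
        positions = [i for i, c in enumerate(seq) if c == symbol]
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        vec.append(float(len(positions)))                    # Count
        vec.append(pstdev(gaps) if gaps else 0.0)            # assumed "relative" dispersion
        vec.append(pstdev(positions) if positions else 0.0)  # assumed "absolute" dispersion
    return vec


def segment_transform(seq, depth):
    """Halve seq recursively `depth` times and concatenate the leaf descriptors."""
    if depth == 0 or len(seq) < 2:
        return descriptor(seq)
    mid = len(seq) // 2  # halves are equal only when len(seq) is even
    return segment_transform(seq[:mid], depth - 1) + segment_transform(seq[mid:], depth - 1)


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(len(qgram_vector(seq, 3)), len(qgram_vector(seq, 6)))
    print(len(descriptor(seq)), len(segment_transform(seq, 2)))
```

In this sketch the q-gram vector length is 4**q, while the descriptor length stays at 12 regardless of sequence length; the segmented variant multiplies that by the number of leaf segments, which is 2**depth when the sequence is long enough to split at every level.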





Publication date: 2006